Segmentation of Speech Signals in Template-based Speech to Singing Conversion
Abstract
Singing voice synthesis has found numerous applications in the entertainment industry in recent years. Template-based personalized singing voice synthesis is a new approach to generating high-quality singing voice: it synthesizes singing by converting speech in which the lyrics of a song are read aloud. In this method, template speaking and singing voices are first recorded in order to model the transformation from speech to singing. To improve accuracy while reducing computational load, the template voices are divided into several segments, so that fine alignment and the subsequent conversion can be performed separately for each segment. To generate the singing voice correctly, a new speech input must be divided into corresponding segments, each containing the same stanza as the matching segment of the template voices. This paper proposes an automatic segmentation method for this purpose. Experimental results show that segmentation with the proposed method is comparable to manual segmentation, with an accuracy of 98.24%, and that this performance remains consistent in the presence of noise.
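The abstract does not describe how the proposed segmentation works. As a rough illustration of the underlying problem only, the sketch below transfers segment boundaries marked on a template speech recording onto a new recording of the same lyrics using plain dynamic time warping (DTW). DTW is a generic alignment technique swapped in here for illustration; it is not claimed to be the authors' method. The function names (`dtw_path`, `transfer_boundaries`) are hypothetical, and the inputs are assumed to be non-empty sequences of precomputed per-frame feature vectors (for example MFCCs).

```python
import math
from typing import List, Sequence, Tuple

# A frame is one feature vector (e.g. MFCCs for one short window); assumed precomputed.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(template: Sequence[Frame], query: Sequence[Frame]) -> List[Tuple[int, int]]:
    """Return the minimum-cost DTW alignment as (template_frame, query_frame) pairs."""
    n, m = len(template), len(query)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of template[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(template[i - 1], query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the last frame pair to the first, following the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),  # both sequences advance
            (cost[i - 1][j], i - 1, j),          # only the template advances
            (cost[i][j - 1], i, j - 1),          # only the query advances
        )
    path.reverse()
    return path


def transfer_boundaries(
    template: Sequence[Frame], query: Sequence[Frame], template_boundaries: Sequence[int]
) -> List[int]:
    """Map each template segment-start frame to the first query frame aligned with it."""
    path = dtw_path(template, query)
    return [min(q for t, q in path if t == b) for b in template_boundaries]


if __name__ == "__main__":
    # Toy 1-D "features": three segments, the new speech is spoken more slowly.
    template = [[0.0], [0.0], [5.0], [5.0], [9.0], [9.0]]
    speech = [[0.0], [0.0], [0.0], [5.0], [5.0], [5.0], [9.0], [9.0]]
    print(transfer_boundaries(template, speech, [0, 2, 4]))  # [0, 3, 6]
```

Plain DTW fills an n-by-m cost matrix, so its cost grows with the product of the two recordings' lengths. This is consistent with the abstract's point that working segment by segment can reduce computational load.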
Publication date: 2011